Cloud services are not immune to outages, and the severity and scope of impact
to the customer can vary based on the outage situation. Similar to any
internal IT-supported application, business impact due to a service outage
will depend on the criticality of the cloud application and its
relationship to internal business processes. In the case of
business-critical applications where businesses rely on the continuous
availability of service, even a few minutes of service outage can have a
serious impact on your organization’s productivity, revenue, customer
satisfaction, and service-level compliance. According to the Cloud Computing Incidents Database (CCID), which tracks cloud service outages, major CSPs have
suffered downtime ranging from a few minutes to a few hours. In one case,
a service outage lasted more than 24 hours! Furthermore, depending on the
severity of the incident and the scope of the affected infrastructure,
outages may affect all or a subset of customers. During a cloud service
disruption, affected customers may be unable to access the cloud
service or, in some cases, may suffer degraded performance or user
experience. For example, when a storage service is disrupted, it will
affect the availability and performance of a computing service that
depends on the storage service.
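The storage example can be sketched as a walk over a service dependency map. This is only an illustration: the service names and the `DEPENDS_ON` map are hypothetical and do not describe any particular CSP's architecture.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "compute": ["storage"],
    "web_app": ["compute"],
    "analytics": ["storage"],
    "dns": [],
    "storage": [],
}


def affected_services(failed, depends_on):
    """Return every service that depends, directly or transitively, on `failed`."""
    # Invert the map so each service points at the services that rely on it.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return sorted(seen)


print(affected_services("storage", DEPENDS_ON))
# ['analytics', 'compute', 'web_app']
```

In this sketch a storage disruption reaches the computing service directly and the web application indirectly, while the unrelated DNS service is untouched.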
Figure 1 and Figure 2 show some examples of recent
outages.
In regard to Figure 2, web users across the
globe reported outages on myriad Google services, including Gmail, Google News, Google Docs,
Google Calendar, Google Analytics, Google Maps, Google AdSense, and Google
Search. Google acknowledged the problem and said it had been solved,
blaming the traffic slowdown on a routing mistake.
In another example, on December 20, 2005,
Salesforce.com (the on-demand customer
relationship management service) said it suffered from a system outage
that prevented users from accessing the system during business hours.
Users “experienced intermittent access” from 9:30 a.m. to 12:41 p.m.
Eastern time and from 2:00 p.m. to 4:45 p.m. Eastern time because of a
database cluster error in one of the company’s four global network nodes,
company officials said in a statement the day following the outage. The
statement added that “Salesforce.com addressed the issue with the database
vendor” so that service could be restored in the afternoon.
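To put the two reported windows in perspective, the arithmetic can be worked through as below. This is a rough sketch under two assumptions that are not in the company's statement: the period is a 30-day month, and every minute inside the windows counts as downtime. Because access was described as intermittent, the result is a lower bound on availability rather than a measured figure.

```python
from datetime import datetime

# Windows of "intermittent access" quoted in the statement (Eastern time),
# written in 24-hour form.
WINDOWS = [("09:30", "12:41"), ("14:00", "16:45")]


def window_minutes(start, end):
    """Length of a same-day HH:MM window in whole minutes."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def availability_percent(downtime_minutes, period_minutes):
    """Availability over a period, expressed as a percentage."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


affected = sum(window_minutes(start, end) for start, end in WINDOWS)

# Assumption: a 30-day month, with every affected minute treated as downtime.
MONTH_MINUTES = 30 * 24 * 60

print(affected)  # 356
print(round(availability_percent(affected, MONTH_MINUTES), 3))  # 99.176
```

Under these assumptions, 356 affected minutes in a single day would already place the month below a 99.9% availability level.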
Factors Impacting Availability
Cloud service resiliency and availability depend on several
factors, including the CSP’s data center architecture (load balancers,
networks, systems), application architecture, hosting location
redundancy, diversity of Internet service providers (ISPs), and data storage architecture. The major
factors are as follows:
SaaS and PaaS application architecture and redundancy.
Cloud service data center architecture, and network and
systems architecture, including geographically diverse and
fault-tolerant architecture.
Reliability and redundancy of Internet connectivity used by
the customer and the CSP.
Customer’s ability to respond quickly and fall back on
internal applications and other processes, including manual
procedures.
Customer’s visibility of the fault. In some downtime events,
if the impact affects a small subset of users, it may be difficult
to get a full picture of the impact, which makes the situation
harder to troubleshoot.
Reliability of hardware and software components used in
delivering the cloud service.
Efficacy of the security and network infrastructure to
withstand a distributed denial of service (DDoS) attack on the
cloud service.
Efficacy of security controls and processes that reduce human
error and protect infrastructure from malicious internal and
external threats, e.g., privileged users abusing privileges.
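One common way to reason about how several of these factors combine is the textbook series/parallel availability model, sketched below. The model is not taken from this text. It assumes that component failures are independent, which is an idealization, and every availability figure shown is hypothetical.

```python
from math import prod


def serial(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel(availabilities):
    """Availability of redundant components in which any one being up is enough."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures: two redundant ISP links at the customer site,
# then the CSP's network, then the cloud application itself.
isp_links = parallel([0.99, 0.99])
end_to_end = serial([isp_links, 0.9995, 0.999])

print(f"redundant ISP links: {isp_links:.4f}")
print(f"end to end:          {end_to_end:.4f}")
```

The sketch shows why redundancy and ISP diversity appear in the list: two independent links at 99% each behave like a single link at about 99.99%, while each additional component in the chain can only lower the end-to-end figure.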